Similar resources
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Entity resolution for probabilistic data
Entity resolution is the problem of identifying the tuples that represent the same real world entity. In this paper, we address the problem of entity resolution over probabilistic data (ERPD), which arises in many applications that have to deal with probabilistic data. To deal with the ERPD problem, we distinguish between two classes of similarity functions, i.e. context-free and context-sensit...
Disinformation Techniques for Entity Resolution
We study the problem of disinformation. We assume that an “agent” has some sensitive information that the “adversary” is trying to obtain. For example, a camera company (the agent) may secretly be developing its new camera model, and a user (the adversary) may want to know in advance the detailed specs of the model. The agent’s goal is to disseminate false information to “dilute” what is known ...
Entity Resolution for Uncertain Data
Entity resolution (ER), also known as duplicate detection or record matching, is the problem of identifying the tuples that represent the same real world entity. In this paper, we address the problem of ER for uncertain data, which we call ERUD. We propose two different approaches for the ERUD problem based on two classes of similarity functions, i.e. context-free and context-sensitive. We prop...
Crowdsourcing Algorithms for Entity Resolution
In this paper, we study a hybrid human-machine approach for solving the problem of Entity Resolution (ER). The goal of ER is to identify all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Our input is a graph over all the records in a database, where each edge has a probability denoting our prior belief (based on Machine Learning mode...
Journal
Journal title: Information Systems
Year: 2019
ISSN: 0306-4379
DOI: 10.1016/j.is.2019.06.003